8.3.2 Finding Interval Estimators

Here we discuss how to find interval estimators. Before doing so, let's review a simple fact about random variables and their distributions. Let $X$ be a continuous random variable with CDF $F_X(x) = P(X \le x)$. Suppose that we are interested in finding two values $x_l$ and $x_h$ such that

$$P(x_l \le X \le x_h) = 1 - \alpha.$$

One way to do this is to choose $x_l$ and $x_h$ such that

$$P(X \le x_l) = \frac{\alpha}{2}, \quad \text{and} \quad P(X \ge x_h) = \frac{\alpha}{2}.$$

Equivalently,

$$F_X(x_l) = \frac{\alpha}{2}, \quad \text{and} \quad F_X(x_h) = 1 - \frac{\alpha}{2}.$$

We can rewrite these equations by using the inverse function $F_X^{-1}$ as

$$x_l = F_X^{-1}\left(\frac{\alpha}{2}\right), \quad \text{and} \quad x_h = F_X^{-1}\left(1 - \frac{\alpha}{2}\right).$$

We call the interval $[x_l, x_h]$ a $(1 - \alpha)$ interval for $X$. Figure 8.2 shows the values of $x_l$ and $x_h$ using the CDF of $X$, and also using the PDF of $X$.
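The inverse-CDF recipe above can be sketched in a few lines of Python. This is only an illustration: the choice of an Exponential(1) distribution, and the helper names `equal_tail_interval`, `exp_cdf`, and `exp_inv_cdf`, are assumptions made for the example and do not come from the text.

```python
import math

def equal_tail_interval(inv_cdf, alpha):
    """Return (x_l, x_h) with x_l = F^{-1}(alpha/2) and x_h = F^{-1}(1 - alpha/2)."""
    return inv_cdf(alpha / 2), inv_cdf(1 - alpha / 2)

# Illustrative assumption: X ~ Exponential(1), so F(x) = 1 - exp(-x)
# and F^{-1}(u) = -ln(1 - u) for 0 < u < 1.
def exp_cdf(x):
    return 1 - math.exp(-x)

def exp_inv_cdf(u):
    return -math.log(1 - u)

x_l, x_h = equal_tail_interval(exp_inv_cdf, alpha=0.05)
coverage = exp_cdf(x_h) - exp_cdf(x_l)  # P(x_l <= X <= x_h)
print(x_l, x_h, coverage)
```

Any continuous distribution with a computable inverse CDF could be plugged in the same way.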




Figure 8.2 - $[x_l, x_h]$ is a $(1 - \alpha)$ interval for $X$, that is, $P(x_l \le X \le x_h) = 1 - \alpha$.
Example 8.12

Let $Z \sim N(0, 1)$. Find $x_l$ and $x_h$ such that

$$P(x_l \le Z \le x_h) = 0.95.$$


• Solution

o Here, $\alpha = 0.05$ and the CDF of $Z$ is given by the $\Phi$ function. Thus, we can choose

$$x_l = \Phi^{-1}(0.025) = -1.96, \quad \text{and} \quad x_h = \Phi^{-1}(1 - 0.025) = 1.96.$$

Thus, for a standard normal random variable $Z$, we have

$$P(-1.96 \le Z \le 1.96) = 0.95.$$

More generally, we can find a $(1 - \alpha)$ interval for the standard normal random variable. Assume $Z \sim N(0, 1)$. Let us define a notation that is commonly used. For any $p \in [0, 1]$, we define $z_p$ as the real value for which

$$P(Z > z_p) = p.$$

Therefore,

$$\Phi(z_p) = 1 - p, \quad z_p = \Phi^{-1}(1 - p).$$

By symmetry of the normal distribution, we also conclude

$$z_{1-p} = -z_p.$$

Figure 8.3 shows $z_p$ and $z_{1-p} = -z_p$ on the real line. In MATLAB, to compute $z_p$ you can use the following command: norminv(1 - p).
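For readers without MATLAB, a rough Python counterpart uses `statistics.NormalDist` from the standard library. The helper name `z` is an assumption for this sketch; note that `inv_cdf` only accepts arguments strictly between 0 and 1, so the endpoints $p = 0$ and $p = 1$ are not covered.

```python
from statistics import NormalDist

def z(p):
    """Return z_p, the value with P(Z > z_p) = p for Z ~ N(0, 1), 0 < p < 1."""
    return NormalDist().inv_cdf(1 - p)

print(round(z(0.025), 2))  # 1.96
print(z(0.975) + z(0.025))  # close to 0, illustrating z_{1-p} = -z_p
```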



Figure 8.3 - By definition, $z_p$ is the real number for which we have $\Phi(z_p) = 1 - p$.
Now, using the $z_p$ notation, we can state a $(1 - \alpha)$ interval for the standard normal random variable $Z$ as

$$P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) = 1 - \alpha.$$

Figure 8.4 shows the $(1 - \alpha)$ interval for the standard normal random variable $Z$.

Figure 8.4 - A $(1 - \alpha)$ interval for the $N(0, 1)$ distribution. In particular, in this figure, we have $P\left(Z \in \left[-z_{\alpha/2}, z_{\alpha/2}\right]\right) = 1 - \alpha$.


Now, let's talk about how we can find interval estimators. A general approach is to start with a point estimator $\hat{\Theta}$, such as the MLE, and create the interval $[\hat{\Theta}_l, \hat{\Theta}_h]$ around it such that $P\left(\theta \in [\hat{\Theta}_l, \hat{\Theta}_h]\right) \ge 1 - \alpha$. How do we do this? Let's look at an example.


Example 8.13 

Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from a normal distribution $N(\theta, 1)$. Find a 95% confidence interval for $\theta$.


• Solution 

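o Let's start with a point estimator $\hat{\Theta}$ for $\theta$. Since $\theta$ is the mean of the distribution, we can use the sample mean

$$\hat{\Theta} = \overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

Since $X_i \sim N(\theta, 1)$ and the $X_i$'s are independent, we conclude that

$$\overline{X} \sim N\left(\theta, \frac{1}{n}\right).$$

By normalizing $\overline{X}$, we conclude that the random variable

$$\frac{\overline{X} - \theta}{\frac{1}{\sqrt{n}}} = \sqrt{n}(\overline{X} - \theta)$$

has a $N(0, 1)$ distribution. Therefore, by Example 8.12, we conclude

$$P\left(-1.96 \le \sqrt{n}(\overline{X} - \theta) \le 1.96\right) = 0.95,$$

which is equivalent to (by rearranging the terms)

$$P\left(\overline{X} - \frac{1.96}{\sqrt{n}} \le \theta \le \overline{X} + \frac{1.96}{\sqrt{n}}\right) = 0.95.$$

Therefore, we can report the interval

$$[\hat{\Theta}_l, \hat{\Theta}_h] = \left[\overline{X} - \frac{1.96}{\sqrt{n}}, \overline{X} + \frac{1.96}{\sqrt{n}}\right]$$

as our 95% confidence interval for $\theta$.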
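To see what the 95% statement means in practice, the following sketch builds the interval from Example 8.13 on many simulated samples and counts how often it contains the true mean. The true value `theta=3.0`, the sample size, the number of trials, and the function names are all hypothetical choices made for illustration.

```python
import math
import random

def ci95_unit_variance(sample):
    """Interval [x_bar - 1.96/sqrt(n), x_bar + 1.96/sqrt(n)] for N(theta, 1) data."""
    n = len(sample)
    x_bar = sum(sample) / n
    half_width = 1.96 / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def coverage(theta, n=25, trials=2000, seed=0):
    """Fraction of simulated samples whose interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        low, high = ci95_unit_variance(sample)
        hits += low <= theta <= high
    return hits / trials

print(coverage(theta=3.0))  # should land near 0.95
```

The interval is random (it moves with each sample) while $\theta$ stays fixed, which is why the 95% refers to the procedure rather than to any single reported interval.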


At first, it might seem that our solution to Example 8.13 is not based on a systematic method. You might have asked: "How should I know that I need to work with the normalized $\overline{X}$?" However, by thinking more deeply about the way we solved this example, we can suggest a general method to solve confidence interval problems. The crucial fact about the random variable

$$\sqrt{n}(\overline{X} - \theta)$$

is that its distribution does not depend on the unknown parameter $\theta$. Thus, we could easily find a 95% interval for the random variable $\sqrt{n}(\overline{X} - \theta)$ that did not depend on $\theta$. Such a random variable is called a pivot or a pivotal quantity. Let us define this more precisely.
Pivotal Quantity

Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from a distribution with a parameter $\theta$ that is to be estimated. The random variable $Q$ is said to be a pivot or a pivotal quantity, if it has the following properties:

1. It is a function of the observed data $X_1, X_2, X_3, \ldots, X_n$ and the unknown parameter $\theta$, but it does not depend on any other unknown parameters:

$$Q = Q(X_1, X_2, \cdots, X_n, \theta).$$

2. The probability distribution of $Q$ does not depend on $\theta$ or any other unknown parameters.

Example 8.14 

Check that the random variables $Q_1 = \overline{X} - \theta$ and $Q_2 = \sqrt{n}(\overline{X} - \theta)$ are both valid pivots in Example 8.13.

• Solution 

o We note that $Q_1$ and $Q_2$ by definition are functions of $\overline{X}$ and $\theta$. Since

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n},$$

we conclude that $Q_1$ and $Q_2$ are both functions of the observed data $X_1, X_2, X_3, \ldots, X_n$ and the unknown parameter $\theta$, and they do not depend on any other unknown parameters. Also,

$$Q_1 \sim N\left(0, \frac{1}{n}\right), \quad Q_2 \sim N(0, 1).$$

Thus, their distributions do not depend on $\theta$ or any other unknown parameters. We conclude that $Q_1$ and $Q_2$ are both valid pivots.
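A quick simulation can make the pivot property tangible: whatever value $\theta$ takes, the simulated values of $Q_2 = \sqrt{n}(\overline{X} - \theta)$ should have mean near 0 and variance near 1. This is a sketch only; the values of `theta`, `n`, `trials`, and `seed`, and the function name `simulate_q2`, are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_q2(theta, n=16, trials=5000, seed=1):
    """Simulate Q2 = sqrt(n) * (x_bar - theta) for N(theta, 1) samples.

    Returns the empirical mean and variance of the simulated Q2 values.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        x_bar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        values.append(math.sqrt(n) * (x_bar - theta))
    return statistics.mean(values), statistics.variance(values)

# The empirical mean and variance should look the same for very different thetas.
for theta in (-5.0, 0.0, 40.0):
    print(theta, simulate_q2(theta))
```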


To summarize, here are the steps in the pivotal method for finding confidence intervals: 

1. First, find a pivotal quantity $Q(X_1, X_2, \cdots, X_n, \theta)$.

2. Find an interval for $Q$ such that

$$P(q_l \le Q \le q_h) = 1 - \alpha.$$

3. Using algebraic manipulations, convert the above equation to an equation of the form

$$P(\hat{\Theta}_l \le \theta \le \hat{\Theta}_h) = 1 - \alpha.$$


You are probably still not sure how exactly you can perform these steps. The most crucial one 
is the first step. How do we find a pivotal quantity? Luckily, for many important cases that 
appear frequently in practice, statisticians have already found the pivotal quantities, so we can 
use their results directly. In practice, many of the interval estimation problems you encounter 
are of the forms for which general confidence intervals have been found previously. Therefore, 
to solve many confidence interval problems, it suffices to write the problem in a format similar 
to a previously solved problem. As you see more examples, you will feel more confident about 
solving confidence interval problems. 


Example 8.15 

Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from a distribution with known variance $\mathrm{Var}(X_i) = \sigma^2$, and unknown mean $EX_i = \theta$. Find a $(1 - \alpha)$ confidence interval for $\theta$. Assume that $n$ is large.

• Solution 

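o As usual, to find a confidence interval, we start with a point estimate. Since $\theta = EX_i$, a natural choice is the sample mean

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

Since $n$ is large, by the Central Limit Theorem (CLT), we conclude that

$$Q = \frac{\overline{X} - \theta}{\frac{\sigma}{\sqrt{n}}}$$

has approximately $N(0, 1)$ distribution. In particular, $Q$ is a function of the $X_i$'s and $\theta$, and its distribution does not depend on $\theta$, or any other unknown parameters. Thus, $Q$ is a pivotal quantity. The next step is to find a $(1 - \alpha)$ interval for $Q$. As we saw before, a $(1 - \alpha)$ interval for the standard normal random variable $Q$ can be stated as

$$P\left(-z_{\alpha/2} \le Q \le z_{\alpha/2}\right) = 1 - \alpha.$$

Therefore,

$$P\left(-z_{\alpha/2} \le \frac{\overline{X} - \theta}{\frac{\sigma}{\sqrt{n}}} \le z_{\alpha/2}\right) = 1 - \alpha,$$

which is equivalent to

$$P\left(\overline{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \theta \le \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$

We conclude that

$$\left[\overline{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$

is a $(1 - \alpha)100\%$ confidence interval for $\theta$.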


The above example is our first important case of known interval estimators, so let's summarize 
what we have shown: 

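Assumptions: A random sample $X_1, X_2, X_3, \ldots, X_n$ is given from a distribution with known variance $\mathrm{Var}(X_i) = \sigma^2 < \infty$; $n$ is large.

Parameter to be Estimated: $\theta = EX_i$.

Confidence Interval: $\left[\overline{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$ is approximately a $(1 - \alpha)100\%$ confidence interval for $\theta$.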

Note that to obtain the above interval, we used the CLT. Thus, what we found is an 
approximate confidence interval. Nevertheless, for large n, the approximation is very good. 
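The known-variance interval summarized above translates directly into code. The sketch below assumes Python's standard-library `statistics.NormalDist` for $z_{\alpha/2}$; the function name and the numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval_known_sigma(x_bar, sigma, n, alpha):
    """Approximate (1 - alpha) interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Made-up inputs: sample mean 10.0 from n = 400 observations with sigma = 2.
low, high = z_interval_known_sigma(x_bar=10.0, sigma=2.0, n=400, alpha=0.05)
print(low, high)
```

Because the half-width scales as $1/\sqrt{n}$, halving the width of the interval requires roughly four times as many observations.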


Example 8.16 

An engineer is measuring a quantity $\theta$. It is assumed that there is a random error in each measurement, so the engineer will take $n$ measurements and report the average of the measurements as the estimated value of $\theta$. Here, $n$ is assumed to be large enough so that the central limit theorem applies. If $X_i$ is the value that is obtained in the $i$th measurement, we assume that

$$X_i = \theta + W_i,$$

where $W_i$ is the error in the $i$th measurement. We assume that the $W_i$'s are i.i.d. with $EW_i = 0$ and $\mathrm{Var}(W_i) = 4$ units. The engineer reports the average of the measurements

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

How many measurements does the engineer need to make to be 90% sure that the final error is less than 0.25 units? In other words, what should the value of $n$ be such that

$$P\left(\theta - 0.25 \le \overline{X} \le \theta + 0.25\right) \ge 0.90?$$


• Solution 

o Note that, here, the $X_i$'s are i.i.d. with mean

$$EX_i = \theta + EW_i = \theta,$$

and variance

$$\mathrm{Var}(X_i) = \mathrm{Var}(W_i) = 4.$$

Thus, we can restate the problem using our confidence interval terminology: "Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from a distribution with known variance $\mathrm{Var}(X_i) = \sigma^2 = 4$. How large should $n$ be so that the interval

$$\left[\overline{X} - 0.25, \overline{X} + 0.25\right]$$

is a 90% confidence interval for $\theta = EX_i$?"

By our discussion above, the $(1 - \alpha)100\%$ confidence interval for $\theta = EX_i$ is given by

$$\left[\overline{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right].$$

Thus, we need

$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 0.25,$$

where $\sigma = 2$, $\alpha = 1 - 0.90 = 0.1$. In particular,

$$z_{\alpha/2} = z_{0.05} = \Phi^{-1}(1 - 0.05) = 1.645.$$

Thus, we need to have

$$1.645\frac{2}{\sqrt{n}} = 0.25.$$

Solving for $n$ gives $n = \left(\frac{1.645 \times 2}{0.25}\right)^2 \approx 173.2$. We conclude that $n \ge 174$ is sufficient.
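The sample-size calculation in Example 8.16 amounts to solving $z_{\alpha/2}\,\sigma/\sqrt{n} \le \text{margin}$ for $n$ and rounding up. A minimal sketch, with the helper name `required_sample_size` chosen here for illustration:

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, alpha):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Example 8.16: sigma = 2, margin = 0.25, confidence level 90% (alpha = 0.1).
print(required_sample_size(sigma=2.0, margin=0.25, alpha=0.1))  # 174
```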


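Now suppose that $X_1, X_2, X_3, \ldots, X_n$ is a random sample from a distribution with unknown variance $\mathrm{Var}(X_i) = \sigma^2$. Our goal is to find a $(1 - \alpha)$ confidence interval for $\theta = EX_i$. We also assume that $n$ is large. By the above discussion, we can say

$$P\left(\overline{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \theta \le \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$

However, there is a problem here. We do not know the value of $\sigma$. How do we deal with this issue? There are two general approaches: we can either find an upper bound for $\sigma$, or we can estimate $\sigma$.

1. An upper bound for $\sigma$: Suppose that we can somehow show that

$$\sigma \le \sigma_{max},$$

where $\sigma_{max} < \infty$ is a real number. Then, if we replace $\sigma$ in

$$\left[\overline{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$

by $\sigma_{max}$, the interval gets bigger. In other words, the interval

$$\left[\overline{X} - z_{\alpha/2}\frac{\sigma_{max}}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{\sigma_{max}}{\sqrt{n}}\right]$$

is still a valid $(1 - \alpha)100\%$ confidence interval for $\theta$.

2. Estimate $\sigma^2$: Note that here, since $n$ is large, we should be able to find a relatively good estimate for $\sigma^2$. After estimating $\sigma^2$, we can substitute that estimate for $\sigma$ in

$$\left[\overline{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$

to find an approximate $(1 - \alpha)100\%$ confidence interval for $\theta$.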
We now provide examples of each approach.


Example 8.17 

(Public Opinion Polling) We would like to estimate the portion of people who plan to vote for Candidate A in an upcoming election. It is assumed that the number of voters is large, and $\theta$ is the portion of voters who plan to vote for Candidate A. We define the random variable $X$ as follows. A voter is chosen uniformly at random among all voters and we ask her/him: "Do you plan to vote for Candidate A?" If she/he says "yes," then $X = 1$, otherwise $X = 0$. Then,

$$X \sim Bernoulli(\theta).$$

Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from this distribution, which means that the $X_i$'s are i.i.d. and $X_i \sim Bernoulli(\theta)$. In other words, we randomly select $n$ voters (with replacement) and we ask each of them if they plan to vote for Candidate A. Find a $(1 - \alpha)100\%$ confidence interval for $\theta$ based on $X_1, X_2, X_3, \ldots, X_n$.

• Solution 

o Note that, here,

$$EX_i = \theta.$$

Thus, we want to estimate the mean of the distribution. Note also that

$$\mathrm{Var}(X_i) = \sigma^2 = \theta(1 - \theta).$$

Thus, to find $\sigma$, we need to know $\theta$. But $\theta$ is the parameter that we would like to estimate in the first place. By the above discussion, we know that if we can find an upper bound for $\sigma$, we can use it to build a confidence interval for $\theta$. Luckily, it is easy to find an upper bound for $\sigma$ in this problem. More specifically, if you define

$$f(\theta) = \theta(1 - \theta), \quad \text{for } \theta \in [0, 1],$$

then by taking derivatives, you can show that the maximum value of $f(\theta)$ is obtained at $\theta = \frac{1}{2}$ and that

$$f(\theta) \le f\left(\frac{1}{2}\right) = \frac{1}{4}, \quad \text{for } \theta \in [0, 1].$$

We conclude that

$$\sigma_{max} = \frac{1}{2}$$

is an upper bound for $\sigma$. We conclude that the interval

$$\left[\overline{X} - z_{\alpha/2}\frac{\sigma_{max}}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{\sigma_{max}}{\sqrt{n}}\right]$$

is a $(1 - \alpha)100\%$ confidence interval for $\theta$, where $\sigma_{max} = \frac{1}{2}$. Thus,

$$\left[\overline{X} - \frac{z_{\alpha/2}}{2\sqrt{n}}, \overline{X} + \frac{z_{\alpha/2}}{2\sqrt{n}}\right]$$

is a $(1 - \alpha)100\%$ confidence interval for $\theta$. Note that we obtained the interval by using the CLT, so it is an approximate interval. Nevertheless, for large $n$, the approximation is very good. Also, since we have used an upper bound for $\sigma$, this confidence interval might be too conservative, specifically if $\theta$ is far from $\frac{1}{2}$.


The above setting is another important case of known interval estimators, so let's summarize 
it: 

Assumptions: A random sample $X_1, X_2, X_3, \ldots, X_n$ is given from a $Bernoulli(\theta)$; $n$ is large.

Parameter to be Estimated: $\theta$

Confidence Interval: $\left[\overline{X} - \frac{z_{\alpha/2}}{2\sqrt{n}}, \overline{X} + \frac{z_{\alpha/2}}{2\sqrt{n}}\right]$ is approximately a $(1 - \alpha)100\%$ confidence interval for $\theta$. This is a conservative confidence interval as it is obtained using an upper bound for $\sigma$.
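As a rough sketch of the conservative rule above, the function below takes the observed sample proportion and returns the interval built with $\sigma_{max} = \frac{1}{2}$. The function name and the poll numbers (540 "yes" answers out of 1000) are made up for illustration.

```python
import math
from statistics import NormalDist

def conservative_proportion_interval(x_bar, n, alpha):
    """Conservative (1 - alpha) interval for a Bernoulli parameter.

    Uses the bound sigma <= 1/2, so the half-width is z_{alpha/2} / (2 sqrt(n)).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z / (2 * math.sqrt(n))
    return x_bar - half_width, x_bar + half_width

# Hypothetical poll: 540 of 1000 sampled voters say "yes".
low, high = conservative_proportion_interval(x_bar=540 / 1000, n=1000, alpha=0.05)
print(low, high)
```

Note that the half-width depends only on $n$ and $\alpha$, not on the observed proportion, which is what makes this rule convenient for planning a poll in advance.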


Example 8.18 

There are two candidates in a presidential election: Candidate A and Candidate B. Let $\theta$ be the portion of people who plan to vote for Candidate A. Our goal is to find a confidence interval for $\theta$. Specifically, we choose a random sample (with replacement) of $n$ voters and ask them if they plan to vote for Candidate A. Our goal is to estimate $\theta$ such that the margin of error is 3 percentage points. Assume a 95% confidence level. That is, we would like to choose $n$ such that

$$P\left(\overline{X} - 0.03 \le \theta \le \overline{X} + 0.03\right) \ge 0.95,$$

where $\overline{X}$ is the portion of people in our random sample that say they plan to vote for Candidate A. How large does $n$ need to be?

• Solution 

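o Based on the above discussion,

$$\left[\overline{X} - \frac{z_{\alpha/2}}{2\sqrt{n}}, \overline{X} + \frac{z_{\alpha/2}}{2\sqrt{n}}\right]$$

is a valid $(1 - \alpha)100\%$ confidence interval for $\theta$. Therefore, we need to have

$$\frac{z_{\alpha/2}}{2\sqrt{n}} = 0.03.$$

Here, $\alpha = 0.05$, so $z_{\alpha/2} = z_{0.025} = 1.96$. Therefore, we obtain

$$n = \left(\frac{1.96}{2 \times 0.03}\right)^2 \approx 1067.1.$$

We conclude $n \ge 1068$ is enough. The above calculation provides a reason why most polls before elections are conducted with a sample size of around one thousand.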
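The same calculation can be scripted to explore other margins of error. This is a sketch under the conservative bound $\sigma \le \frac{1}{2}$; the function name `poll_sample_size` is an assumption for the example.

```python
import math
from statistics import NormalDist

def poll_sample_size(margin, alpha):
    """Smallest n with z_{alpha/2} / (2 sqrt(n)) <= margin (conservative bound)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z / (2 * margin)) ** 2)

print(poll_sample_size(margin=0.03, alpha=0.05))  # 1068
print(poll_sample_size(margin=0.01, alpha=0.05))  # 9604
```

Tightening the margin from 3 points to 1 point multiplies the required sample size by about nine, which helps explain why few polls aim for a 1-point margin.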


As we mentioned, the above calculation might be a little conservative. Another approach would be to estimate $\sigma^2$ instead of using an upper bound. In this example, the structure of the problem suggests a way to estimate $\sigma^2$. Specifically, since

$$\sigma^2 = \theta(1 - \theta),$$

we may use

$$\hat{\sigma}^2 = \hat{\theta}(1 - \hat{\theta}) = \overline{X}(1 - \overline{X})$$

as an estimate for $\sigma^2$, where $\hat{\theta} = \overline{X}$. The rationale behind this approximation is that since $n$ is large, $\overline{X}$ is likely a good estimate of $\theta$, thus $\hat{\sigma}^2 = \hat{\theta}(1 - \hat{\theta})$ is a good estimate of $\sigma^2$. After estimating $\sigma^2$, we can use

$$\left[\overline{X} - z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right]$$

as an approximate $(1 - \alpha)100\%$ confidence interval for $\theta$. To summarize, we have the following confidence interval rule:

Assumptions: A random sample $X_1, X_2, X_3, \ldots, X_n$ is given from a $Bernoulli(\theta)$; $n$ is large.

Parameter to be Estimated: $\theta$

Confidence Interval: $\left[\overline{X} - z_{\alpha/2}\sqrt{\frac{\overline{X}(1 - \overline{X})}{n}}, \overline{X} + z_{\alpha/2}\sqrt{\frac{\overline{X}(1 - \overline{X})}{n}}\right]$ is approximately a $(1 - \alpha)100\%$ confidence interval for $\theta$.

Again, the above confidence interval is an approximate confidence interval because we used two approximations: the CLT and an approximation for $\sigma^2$.
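To compare the two Bernoulli rules, the sketch below computes the half-width of the plug-in interval, which uses $\overline{X}(1 - \overline{X})$ as the variance estimate, next to the conservative half-width. The function names and the sample proportion 0.9 are hypothetical.

```python
import math
from statistics import NormalDist

def plug_in_half_width(x_bar, n, alpha):
    """Half-width using the estimate x_bar * (1 - x_bar) for sigma^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(x_bar * (1 - x_bar) / n)

def conservative_half_width(n, alpha):
    """Half-width using the upper bound sigma <= 1/2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / (2 * math.sqrt(n))

# Made-up poll result far from 1/2, where the conservative rule is loosest.
x_bar, n, alpha = 0.9, 1000, 0.05
est = plug_in_half_width(x_bar, n, alpha)
cons = conservative_half_width(n, alpha)
print(est, cons)
```

The two half-widths coincide when $\overline{X} = \frac{1}{2}$, and the plug-in version shrinks as $\overline{X}$ moves toward 0 or 1.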


The above scenario is a special case ($Bernoulli(\theta)$) for which we could come up with a point estimator for $\sigma^2$. Can we have a more general estimator for $\sigma^2$ that we can use for any distribution? We have already discussed such a point estimator and we called it the sample variance:

$$S^2 = \frac{1}{n - 1}\sum_{k=1}^{n}(X_k - \overline{X})^2 = \frac{1}{n - 1}\left(\sum_{k=1}^{n}X_k^2 - n\overline{X}^2\right).$$

Thus, using the sample variance, $S^2$, we can have an estimate for $\sigma^2$. If $n$ is large, this estimate is likely to be close to the real value of $\sigma^2$. So let us summarize this discussion as follows:
Assumptions: A random sample $X_1, X_2, X_3, \ldots, X_n$ is given from a distribution with unknown variance $\mathrm{Var}(X_i) = \sigma^2 < \infty$; $n$ is large.

Parameter to be Estimated: $\theta = EX_i$.

Confidence Interval: If $S$ is the sample standard deviation

$$S = \sqrt{\frac{1}{n - 1}\sum_{k=1}^{n}(X_k - \overline{X})^2} = \sqrt{\frac{1}{n - 1}\left(\sum_{k=1}^{n}X_k^2 - n\overline{X}^2\right)},$$

then the interval

$$\left[\overline{X} - z_{\alpha/2}\frac{S}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right]$$

is approximately a $(1 - \alpha)100\%$ confidence interval for $\theta$.
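Starting from raw data, the rule above could be sketched as follows. The data values are made up, and with only eight observations the large-$n$ assumption clearly does not hold, so the example only illustrates the mechanics. The second form of the $S^2$ formula is used here to mirror the text; it can lose precision when the data have a large mean relative to their spread.

```python
import math
import statistics
from statistics import NormalDist

def sample_variance(xs):
    """S^2 = (sum of x_k^2 - n * x_bar^2) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return (sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1)

def z_interval_estimated_sigma(xs, alpha):
    """Approximate (1 - alpha) interval for the mean, using S in place of sigma."""
    n = len(xs)
    x_bar = sum(xs) / n
    s = math.sqrt(sample_variance(xs))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]  # hypothetical measurements
print(sample_variance(data), statistics.variance(data))  # the two should agree
print(z_interval_estimated_sigma(data, alpha=0.05))
```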


Example 8.19 

We have collected a random sample $X_1, X_2, X_3, \ldots, X_{100}$ from an unknown distribution. The sample mean and the sample variance for this random sample are given by

$$\overline{X} = 15.6, \quad S^2 = 8.4.$$

Construct an approximate 99% confidence interval for $\theta = EX_i$.

• Solution 

o Here, the interval

$$\left[\overline{X} - z_{\alpha/2}\frac{S}{\sqrt{n}}, \overline{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right]$$

is approximately a $(1 - \alpha)100\%$ confidence interval for $\theta$. Since $\alpha = 0.01$, we have

$$z_{\alpha/2} = z_{0.005} = 2.576.$$

Using $n = 100$, $\overline{X} = 15.6$, $S^2 = 8.4$, we obtain the following interval

$$\left[15.6 - 2.576\frac{\sqrt{8.4}}{\sqrt{100}}, 15.6 + 2.576\frac{\sqrt{8.4}}{\sqrt{100}}\right] \approx [14.85, 16.35].$$